Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering the underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures that use variational and discrete information bottlenecks, which we call RepDIB, to learn structured, factorized representations. Exploiting the expressiveness brought by factorized representations, we introduce a simple yet effective bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.
Translated by Google Translate
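A person walking along a city street who tries to model every aspect of the world would quickly be overwhelmed by the multitude of shops, cars, and people, each following its own complex and inscrutable dynamics. Yet exploration and navigation in such an environment is an everyday task that requires no great mental effort. Is it possible to turn this fire hose of sensory information into a minimal latent state that is necessary and sufficient for an agent to act successfully in the world? We formulate this question concretely and propose the Agent-Controllable State Discovery algorithm (AC-State), which has theoretical guarantees and is demonstrated in practice to discover the \textit{minimal controllable latent state}, which contains all of the information needed to control the agent while fully discarding all irrelevant information. The algorithm consists of a multi-step inverse model (predicting actions from distant observations) with an information bottleneck. AC-State enables localization, exploration, and navigation without rewards or demonstrations. We demonstrate the discovery of the controllable latent state in three domains: localizing a robot arm amid distractions (e.g., changing lighting conditions and backgrounds), exploring a maze alongside other agents, and navigating in the Matterport house simulator.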
We address the problem of few-shot classification, where the goal is to learn a classifier from a limited set of samples. While data-driven learning has been shown to be effective in various applications, learning from less data remains challenging. To address this challenge, existing approaches consider various data augmentation techniques for increasing the number of training samples. Pseudo-labeling is commonly used in the few-shot setup, where approximate labels are estimated for a large set of unlabeled images. We propose DiffAlign, which focuses on generating images from class labels. Specifically, we leverage the recent success of generative models (e.g., DALL-E and diffusion models) that can generate realistic images from text. However, naive learning on synthetic images is not adequate due to the domain gap between real and synthetic images. Thus, we employ a maximum mean discrepancy (MMD) loss to align the synthetic images with the real images, minimizing the domain gap. We evaluate our method on the standard few-shot classification benchmarks CIFAR-FS, FC100, miniImageNet, and tieredImageNet, and on a cross-domain few-shot classification benchmark, miniImageNet to CUB. The proposed approach significantly outperforms the state of the art in both 5-shot and 1-shot setups on these benchmarks. Our approach is also shown to be effective in the zero-shot classification setup.
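To make the MMD alignment idea concrete, here is a minimal sketch of a squared-MMD estimate between two sets of feature vectors. The abstract does not specify the kernel, the estimator, or the feature extractor, so the RBF kernel, the biased estimator, and the names `rbf_kernel`, `mmd2`, and `bandwidth` are all assumptions made for illustration, not the DiffAlign implementation.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b (assumed kernel)."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, bandwidth).mean()
        + rbf_kernel(y, y, bandwidth).mean()
        - 2.0 * rbf_kernel(x, y, bandwidth).mean()
    )

# Hypothetical stand-ins for real and synthetic image features.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(64, 8))
synth_feats = real_feats + 2.0  # a shifted copy, mimicking a domain gap

gap_same = mmd2(real_feats, real_feats, bandwidth=4.0)
gap_shifted = mmd2(real_feats, synth_feats, bandwidth=4.0)
```

In a training loop, a quantity like `mmd2` would be computed on features of real and synthetic batches and added to the classification loss, so that minimizing it pulls the two feature distributions together.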
Soft actuators have attracted a great deal of interest for rehabilitative and assistive robots because they increase safety and lower costs compared to rigid-body robotic systems. During actuation, soft actuators experience high levels of deformation, which can lead to microscale fractures in their elastomeric structure; these fatigue the system over time and eventually lead to macroscale damage and failure. This paper reports finite element modeling (FEM) of pneu-nets at high bending angles, along with repetitive experimentation at high deformation rates, in order to study the effect and behavior of fatigue in soft robotic actuators, which results in deviation from the ideal behavior. Comparing the FEM model with experimental data, we show that FEM can model the performance of the actuator before fatigue up to a bending angle of 167 degrees with ~96% accuracy. We also show that the FEM model's performance drops to 80% due to fatigue after repetitive high-angle bending. The results of this paper objectively highlight the emergence of fatigue over cyclic activation of the system and the resulting deviation from the computational FEM model. Such behavior can be considered in future controllers to adapt the system to the time-variable and non-autonomous response dynamics of soft robots.
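We present a novel method for placing a 3D human animation into a 3D scene while preserving any human-scene interactions in the animation. We use the notion of computing the meshes in the animation that are most important for interaction with the scene, which we call "keyframes." These keyframes allow us to better optimize the placement of the animation in the scene so that the interactions in the animation (standing, lying, sitting, etc.) match the affordances of the scene (e.g., standing on the floor or lying in a bed). We compare our method, which we call PAAK, with prior approaches, including POSA, PROX ground truth, and a motion synthesis method, and highlight the benefits of our method through a perceptual study. Human raters preferred our PAAK method over the PROX ground truth data 64.6% of the time. Additionally, in direct comparisons, raters preferred PAAK over competing methods, including 61.5% of the time compared to POSA.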
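We study learning in periodic Markov decision processes (MDPs), a special type of non-stationary MDP in which both the state transition probabilities and the reward functions vary periodically, under the average-reward maximization setting. We formulate the problem as a stationary MDP by augmenting the state space with the period index, and propose the periodic upper confidence bound reinforcement learning-2 (PUCRL2) algorithm. We show that the regret of PUCRL2 varies linearly with the period and sublinearly with the horizon length. Numerical results demonstrate the efficacy of PUCRL2.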
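The state-augmentation step can be illustrated with a small sketch. It assumes a tabular periodic MDP given as arrays `P[n, s, a, s']` and `R[n, s, a]` for each phase `n` of the period; the function name `augment_periodic_mdp`, the array layout, and the indexing convention `phase * S + s` are hypothetical choices for illustration, and the sketch does not implement PUCRL2 itself.

```python
import numpy as np

def augment_periodic_mdp(P, R):
    """Build a stationary MDP from a periodic one by adding the phase to the state.

    P: array (N, S, A, S), transition probabilities for each of N phases.
    R: array (N, S, A), rewards for each phase.
    Returns P_aug of shape (N*S, A, N*S) and R_aug of shape (N*S, A),
    where augmented state index = phase * S + original state.
    """
    N, S, A, _ = P.shape
    P_aug = np.zeros((N * S, A, N * S))
    R_aug = np.zeros((N * S, A))
    for n in range(N):
        nxt = (n + 1) % N  # the phase advances deterministically and wraps around
        P_aug[n * S:(n + 1) * S, :, nxt * S:(nxt + 1) * S] = P[n]
        R_aug[n * S:(n + 1) * S, :] = R[n]
    return P_aug, R_aug

# Hypothetical example: period 3, two states, two actions.
rng = np.random.default_rng(0)
N, S, A = 3, 2, 2
P = rng.random((N, S, A, S))
P /= P.sum(axis=-1, keepdims=True)  # normalize each row into a distribution
R = rng.random((N, S, A))
P_aug, R_aug = augment_periodic_mdp(P, R)
```

The augmented state space has `N * S` states, so any algorithm for stationary tabular MDPs can be run on `P_aug` and `R_aug` at the cost of a state space that grows with the period.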
Computational imaging has been revolutionized by compressed sensing (CS) algorithms, which offer guaranteed uniqueness, convergence, and stability properties. In recent years, model-based deep learning methods that combine imaging physics with learned regularization priors have been emerging as more powerful alternatives for image recovery. The main focus of this paper is to introduce a memory-efficient model-based algorithm with theoretical guarantees similar to those of CS methods. The proposed iterative algorithm alternates between a gradient descent step involving the score function and a conjugate gradient algorithm to encourage data consistency. The score function is modeled as a monotone convolutional neural network. Our analysis shows that the monotone constraint is necessary and sufficient to enforce the uniqueness of the fixed point in arbitrary inverse problems. In addition, it guarantees convergence to a fixed point that is robust to input perturbations. Current algorithms, including RED and MoDL, are special cases of the proposed algorithm; the proposed theoretical tools enable the optimization of the framework for the deep equilibrium setting. The proposed deep equilibrium formulation is significantly more memory efficient than unrolled methods, which allows us to apply it to 3D or 2D+time problems that current unrolled algorithms cannot handle.
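The alternation between a gradient step on the score function and a data-consistency step can be sketched on a toy linear inverse problem. This is an illustration under strong assumptions, not the paper's algorithm: the score function `F` is a simple strongly monotone linear map standing in for the monotone CNN, a direct linear solve replaces the conjugate gradient step, and the names `forward_backward`, `lam`, and `alpha` are hypothetical.

```python
import numpy as np

def forward_backward(A, b, F, lam=1.0, alpha=0.5, tol=1e-10, max_iter=10_000):
    """Iterate a gradient step on F followed by a data-consistency solve.

    Any fixed point x satisfies F(x) = lam * A.T @ (b - A @ x).
    """
    n = A.shape[1]
    H = np.eye(n) + alpha * lam * (A.T @ A)  # data-consistency system matrix
    Atb = A.T @ b
    x = np.zeros(n)
    for _ in range(max_iter):
        z = x - alpha * F(x)                              # gradient step on the score
        x_new = np.linalg.solve(H, z + alpha * lam * Atb)  # direct solve in place of CG
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Toy problem: F(x) = mu * x is strongly monotone (assumed stand-in for a CNN).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)
mu = 0.5
x_fp = forward_backward(A, b, lambda x: mu * x)

# For this linear F the fixed point has a closed form (a ridge-regression solution).
x_ref = np.linalg.solve(mu * np.eye(5) + A.T @ A, A.T @ b)
```

With this linear choice of `F`, each iteration is a contraction, so the loop runs until the iterates stop changing rather than for a fixed number of unrolled steps, which is the behavior a deep equilibrium formulation relies on.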
Exposure to ideas in domains outside a scientist's own may benefit her in reformulating existing research problems in novel ways and discovering new application domains for existing solution ideas. While improved performance in scholarly search engines can help scientists efficiently identify relevant advances in domains they may already be familiar with, it may fall short of helping them explore diverse ideas \textit{outside} such domains. In this paper we explore the design of systems aimed at augmenting the end-user ability in cross-domain exploration with flexible query specification. To this end, we develop an exploratory search system in which end-users can select a portion of text core to their interest from a paper abstract and retrieve papers that have a high similarity to the user-selected core aspect but differ in terms of domains. Furthermore, end-users can `zoom in' to specific domain clusters to retrieve more papers from them and understand nuanced differences within the clusters. Our case studies with scientists uncover opportunities and design implications for systems aimed at facilitating cross-domain exploration and inspiration.
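Face recognition networks generally exhibit bias with respect to sensitive attributes such as gender and skin tone. For gender and skin tone, we observe that the regions of the face a network attends to vary with the category of the attribute. This may contribute to bias. Building on this intuition, we propose a novel distillation-based approach called Distill and De-bias (D&D) that forces a network to attend to similar face regions regardless of the attribute category. In D&D, we train a teacher network on images from one category of an attribute, e.g., light skin tone. Then, distilling information from the teacher, we train a student network on images from the remaining category, e.g., dark skin tone. A feature-level distillation loss constrains the student network to generate teacher-like representations. This allows the student network to attend to similar face regions for all attribute categories and enables it to reduce bias. We also propose a second distillation step on top of D&D, called D&D++. For the D&D++ network, we distill the "unbiasedness" of the D&D network into a new student network, the D&D++ network. We train the new network on all attribute categories, e.g., both light and dark skin tones. This helps us train a network that is less biased with respect to an attribute while obtaining higher face verification performance than D&D. We show that D&D++ outperforms existing baselines in reducing gender and skin tone bias on the IJB-C dataset, while obtaining higher face verification performance than existing adversarial de-biasing methods. We evaluate the effectiveness of our proposed methods on two state-of-the-art face recognition networks: Crystalface and ArcFace.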
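Model-based deep learning (MoDL) algorithms that rely on unrolling are emerging as powerful tools for image recovery. In this work, we introduce a novel monotone operator learning framework to overcome some of the challenges associated with current unrolled frameworks, including high memory cost, the lack of guarantees on robustness to perturbations, and low interpretability. Unlike unrolled architectures, which use a finite number of iterations, we use the deep equilibrium (DEQ) framework to iterate the algorithm to convergence and to evaluate the gradients of the convolutional neural network (CNN) blocks using Jacobian iterations. This approach significantly reduces the memory demand, facilitating the extension of MoDL algorithms to high-dimensional problems. We constrain the CNN to be a monotone operator, which allows us to introduce algorithms with guaranteed convergence and robustness properties. We demonstrate the utility of the proposed scheme in the context of parallel MRI.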